Distilling Information from Text: The EDS TemplateFiller System

Authors

  • H. Kelly Shuldberg
  • Melissa Macpherson
  • Pete Humphrey
  • Janii Corley
Abstract

A system is described which digests large volumes of text, filtering out irrelevant articles and distilling the remainder into templates that represent information from the articles in simple slot/filler pairs. The system is highly modular in that it consists of a series of programs, each of which contributes information to the text to help in the final analysis of determining which strings constitute valid values for the slots in the template. This modular design has the dual advantage of allowing relatively easy debugging and of permitting many of the component programs to participate in other projects. The system is customized to specific domains, taking advantage of simple string matching techniques to improve the effectiveness of more complex sentence-level semantic processes. The extension to new domains has been facilitated by dividing system data files into generic vs. specific categories; domain extension requires the creation of only the domain-specific files.
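The modular design described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual component programs or data formats, so everything named here is a hypothetical stand-in: the `Document` class, the stage functions, the `GENERIC_PATTERNS` table, and the `ACQUISITION_DOMAIN` data are assumptions. Simple regular expressions also stand in for the sentence-level semantic processing, which is not reproduced. The sketch only shows the general shape: a series of stages that each add information to the text, a cheap string-matching relevance filter, and a split between generic and domain-specific data.

```python
import re
from dataclasses import dataclass, field

# Hypothetical generic data, shared by every domain.
GENERIC_PATTERNS = {"date": r"\b(?:19|20)\d{2}\b"}

# Hypothetical domain-specific data; extending to a new domain would
# mean writing only a structure like this one.
ACQUISITION_DOMAIN = {
    "keywords": ["acquire", "acquisition", "merger"],
    "slots": {
        "buyer": r"(?P<value>[A-Z]\w+) (?:acquired|will acquire)",
        "target": r"(?:acquired|will acquire) (?P<value>[A-Z]\w+)",
    },
}


@dataclass
class Document:
    """Text plus the information contributed by each stage."""
    text: str
    annotations: dict = field(default_factory=dict)


def relevance_filter(doc, domain):
    # Simple string matching: mark the article relevant if any keyword occurs.
    lowered = doc.text.lower()
    doc.annotations["relevant"] = any(k in lowered for k in domain["keywords"])
    return doc


def generic_annotator(doc, domain):
    # Domain-independent stage: record every match of each generic pattern.
    for name, pattern in GENERIC_PATTERNS.items():
        doc.annotations[name] = re.findall(pattern, doc.text)
    return doc


def slot_filler(doc, domain):
    # Domain-specific stage: a regex stands in for semantic analysis here.
    template = {}
    for slot, pattern in domain["slots"].items():
        match = re.search(pattern, doc.text)
        if match:
            template[slot] = match.group("value")
    doc.annotations["template"] = template
    return doc


# Each stage is independent, so it can be tested or reused on its own.
PIPELINE = [relevance_filter, generic_annotator, slot_filler]


def fill_template(text, domain):
    """Run all stages; return slot/filler pairs, or None if irrelevant."""
    doc = Document(text)
    for stage in PIPELINE:
        doc = stage(doc, domain)
        if doc.annotations.get("relevant") is False:
            return None  # filtered out before the costlier stages run
    template = dict(doc.annotations["template"])
    if doc.annotations["date"]:
        template["date"] = doc.annotations["date"][0]
    return template


if __name__ == "__main__":
    print(fill_template("Acme acquired Widgetco in 1992.", ACQUISITION_DOMAIN))
    print(fill_template("The weather was mild in 1992.", ACQUISITION_DOMAIN))
```

Because every stage takes and returns the same `Document` object, a stage can be debugged in isolation or dropped into another pipeline, which mirrors the two advantages the abstract attributes to the modular design.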


Related resources

An automated, broad-based, near real-time public health surveillance system using presentations to hospital Emergency Departments in New South Wales, Australia

BACKGROUND In a climate of concern over bioterrorism threats and emergent diseases, public health authorities are trialling more timely surveillance systems. The 2003 Rugby World Cup (RWC) provided an opportunity to test the viability of a near real-time syndromic surveillance system in metropolitan Sydney, Australia. We describe the development and early results of this largely automated syste...


Relational Recognition for Information Extraction in Free Text Documents

Information extraction tools provide an important means for distilling content from free text documents, and knowledgebased tools provide an important means for automatically reasoning over statements expressed as well-formed tuples. A number of techniques deliver reliable extraction of entities, less reliable extraction of relations, and poor extraction on entity-entity-relation tuples. Howeve...


KELVIN: a tool for automated knowledge base construction

We present KELVIN, an automated system for processing a large text corpus and distilling a knowledge base about persons, organizations, and locations. We have tested the KELVIN system on several corpora, including: (a) the TAC KBP 2012 Cold Start corpus which consists of public Web pages from the University of Pennsylvania, and (b) a subset of 26k news articles taken from English Gigaword 5th e...


Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...


A survey on Automatic Text Summarization

Text summarization endeavors to produce a summary version of a text, while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to decipher through such massive amount of data, in order to extract the useful information, is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...



Journal:
  • JASIS

Volume 44

Publication date: 1993